Beating Randomized Response on Incoherent Matrices 

Moritz Hardt* Aaron Roth†

November 3, 2011 



Abstract 

Computing accurate low rank approximations of large matrices is a fundamental data mining
task. In many applications, however, the matrix contains sensitive information about individuals.
In such cases we would like to release a low rank approximation that satisfies a strong privacy
guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this
task that satisfies differential privacy is based on naive input perturbation or randomized response:
each entry of the matrix is perturbed independently by a sufficiently large random noise variable,
and a low rank approximation is then computed on the resulting matrix.

We give (the first) significant improvements in accuracy over randomized response under the 
natural and necessary assumption that the matrix has low coherence. Our algorithm is also very 
efficient and finds a constant rank approximation of an $m \times n$ matrix in time $O(mn)$. Note that
even generating the noise matrix required for randomized response already requires time $O(mn)$.

1 Introduction



Consider a large $m \times n$ matrix $A$ in which rows correspond to individuals, columns correspond to
movies, and each non-zero entry $A(i, j)$ represents the rating that individual $i$ has given to movie $j$.
Such a data set shares two important characteristics with many other data sets:

1. It can be represented as a matrix with very different dimensions. There are many more people
than movies, so $n \gg m$.

2. It is composed of sensitive information: the rating that an individual gives to a particular movie
(and the very fact that the individual watched that movie) can be compromising information.

Nevertheless, although we want to reveal little about the existence of individual ratings in this data set, 
it might be extremely useful to be able to allow data analysts to mine such a matrix for statistical
information. Even while protecting the privacy of individual entries, it might still be possible to release
another matrix that encodes a great deal of information about the original data set. For example, we 
might hope to be able to recover the cut structure of the corresponding rating graph, perform principal 
component analysis (PCA), or apply some other data mining technique. 

Indeed, this example is not merely theoretical. Data of exactly this form was released by Netflix
as part of their competition to design improved recommender systems. Spectral methods such
as PCA were commonly used on this dataset, and privacy concerns were acknowledged: Netflix attempted
to "anonymize" the dataset in an ad-hoc way. Following this supposedly anonymized release,
Narayanan and Shmatikov [NS08] were able to re-identify many individuals in the dataset by cross-referencing
the reviews with publicly available reviews in the Internet Movie Database. As a result
of their work, a planned second Netflix challenge was canceled. The story need not have ended this
way, however: the formal privacy guarantee known as differential privacy could have prevented the
attack of [NS08], and indeed, McSherry and Mironov [MM09] demonstrated that many of the recommender
systems proposed in the competition could have been implemented in a differentially private
way. [MM09] make use of private low-rank matrix approximations using input perturbation methods.
In fact, it is not possible to generically improve on input perturbation methods for all matrices without
incurring blatant non-privacy [DN03]. Nevertheless, in this paper, we give the first algorithms for
low rank matrix approximation with performance guarantees that are significantly better than input
perturbation, under certain commonly satisfied conditions which are already assumed in prior work
on non-private low-rank matrix approximation.

In this paper, we consider the problem of privately releasing accurate low-rank approximations to
datasets that can be represented as matrices. Such matrix approximations are one of the most fundamental
building blocks for statistical analysis and data mining, with key applications including latent
semantic indexing and principal component analysis. We provide theorems bounding the accuracy
of our approximations as compared to the optimal low rank approximations in the Frobenius norm.
The classical Eckart-Young theorem asserts that the optimal rank-$k$ approximation of a matrix $A$ (in
either the Frobenius or spectral norm) is obtained by computing the singular value decomposition
$A = U \Sigma V^T$ and releasing the truncated SVD $A_k = U \Sigma_k V^T$, where in $\Sigma_k$ all but the top $k$ singular
values have been zeroed out. Computing the SVD of a matrix takes time $O(mn^2)$. In addition to offering
privacy guarantees, our algorithm is also extremely efficient: it requires only elementary matrix operations
and simple noisy perturbations, and for constant $k$ takes time only $O(mn)$. This represents a
happy confluence of the two goals of privacy and efficiency. Normally, the two are at odds, and differentially
private algorithms tend to be (much) less efficient than their non-private counterparts. In this
case, however, we will see that some algorithms for fast approximate low-rank matrix approximation
are much more amenable to a private implementation than their slower counterparts.

Computing low rank matrix approximations privately has been considered at least since [BDMN05],
and to date, no algorithm has improved over simple input perturbation, which achieves an error (when
compared with the best rank $k$ approximation $A_k$) in Frobenius norm of $O(\sqrt{k(n + m)})$. Although this
error is optimal without making any assumptions on the matrix, it can be prohibitive when the
best rank $k$ approximation is actually very good: when $\|A - A_k\|_F \ll \sqrt{k(n + m)}$. That is, exactly in the
case when a low rank approximation to the matrix would be most useful. We give an algorithm which
improves over input perturbation under the conditions that $m \ll n$ and that the coherence of the matrix
is small: roughly, that no single row of the matrix is too significantly correlated with any of the right
singular vectors of the matrix. Equivalently, no left singular vector has large correlation with one of
the standard basis vectors. Low coherence is a commonly studied and satisfied condition. For example,
Candes and Tao, motivated by the same Netflix Prize dataset re-identified by [NS08], consider
the problem of matrix completion under low coherence conditions [CT10]. They show that matrix
completion is possible under low coherence assumptions, and that several reasonable random matrix
models exhibit a strong incoherence property. Notably, [CT10] were not concerned with privacy at
all: they viewed low coherence as a natural assumption satisfied for datasets resembling the Netflix 
prize data that could be leveraged to obtain stronger utility guarantees. This represents a second 
happy confluence of the goals of data privacy and utility: low coherence is an assumption that others 
already make free of privacy concerns in order to improve the state of the art in data analysis. We 
show that the same assumption can simultaneously be leveraged for data privacy. In retrospect, low 
coherence is also an extremely natural condition in the context of privacy, although one that has not 
previously been considered in the literature. If a matrix fails to have low coherence, then intuitively 
the data of individual rows of the matrix is encoded closely in individual singular vectors. If it does 
have low coherence, no small set of singular vectors can be used to encode any row of the matrix with 
high accuracy, and intuitively, low rank approximations reveal less local information about particular 
entries of the matrix. 

The problem we solve is the following: Given a matrix $A$ and a target rank $k$, we privately compute
and release a rank $O(k)$ matrix $B$ such that $\|A - B\|_F$ is not much larger than $\|A - A_k\|_F$, where $A_k$ is the
optimal rank $k$ approximation to $A$, and $\|\cdot\|_F$ is the Frobenius norm. The quality of the approximation
depends on several factors, including n, m, the desired rank k, and the coherence of the matrix. Our 
approach improves over input perturbation when the matrix coherence is small. 

Our algorithm promises $(\varepsilon, \delta)$-differential privacy [DMNS06] with respect to changes of any single
row of magnitude 1 in the $\ell_2$-norm. This is only stronger than the standard notion of changing
any single entry in the matrix by a unit amount. In the very special case of the matrix representing
a (possibly unbalanced) graph, this captures (for example) the addition or removal of a single edge.
Therefore in this case our algorithm is promising edge privacy rather than vertex privacy. From a
privacy point of view, this is less desirable than vertex privacy, but is still a strong guarantee which is
appropriate in many settings. We note that edge privacy is well studied with respect to graph problems
(see, e.g., [NRS07, GLM+10, GRU11]), and we do not know of any algorithms with non-trivial guarantees
on graphs that promise vertex privacy, nor any algorithms in the more general case of matrices
that promise privacy with respect to entire rows.






1.1 Our results 



We start with our first algorithm that improves over randomized response on matrices of small $C$-coherence.
We say that an $m \times n$ matrix $A$ has coherence $C$ if no row has Euclidean norm more than
$C \cdot \|A\|_F / \sqrt{m}$, i.e., more than $C$ times the typical row norm. This parameter varies between 1 and
$\sqrt{m}$, since no row can have Euclidean norm more than $\|A\|_F$. Intuitively, the condition says that no
single row contributes too significantly to the Frobenius norm of the matrix.
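To make the definition concrete, the following Python sketch computes the smallest $C$ for which a given matrix has coherence $C$. The function names and the two toy matrices are illustrative assumptions, not taken from the paper.

```python
import math

def frobenius_norm(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in A for x in row))

def c_coherence(A):
    """Smallest C such that every row norm is at most C * ||A||_F / sqrt(m)."""
    m = len(A)
    max_row_norm = max(math.sqrt(sum(x * x for x in row)) for row in A)
    return max_row_norm * math.sqrt(m) / frobenius_norm(A)

# All four rows carry the same weight: coherence 1, the smallest possible value.
flat = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
# All weight sits in one row: coherence sqrt(m) = 2, the largest possible value.
spiky = [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

flat_c = c_coherence(flat)
spiky_c = c_coherence(spiky)
```

The two examples sit at the extremes of the range: evenly spread row norms give the minimum, and a single dominant row gives the maximum.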

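Theorem 1.1 (Informal version of Theorem 6.2). There is an $(\varepsilon, \delta)$-differentially private algorithm
which, given a matrix $A \in \mathbb{R}^{m \times n}$ of coherence $C$ such that $n \ge m$, computes a rank $2k$ matrix $B$ such
that with probability 9/10,

$$\|A - B\|_F \le O(\|A - A_k\|_F) + O_{\varepsilon,\delta}\left(\sqrt{km} + \sqrt{kn} \cdot \sqrt{\frac{C k \|A\|_F}{\sqrt{nm}}}\right).$$

Moreover, the algorithm runs in time $O(kmn)$.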

Hidden in the $O_{\varepsilon,\delta}$-notation is a factor of $O(\log(k/\delta)/\varepsilon)$ that depends on the privacy parameters.
Usually, $\delta \ll 1/k$ so that $\log(k/\delta) \le 2\log(1/\delta)$. To understand the error bound, note that the first term
is proportional to the best possible approximation error $\|A - A_k\|_F$ of any rank $k$ approximation. In
particular, this term is optimal up to constant factors. The second term expresses a more interesting
phenomenon. Recall that we assume $n \gg m$, so that $\sqrt{kn}$ would usually dominate $\sqrt{km}$, except that
the $\sqrt{kn}$ term is multiplied by a factor which can be very small if the matrix has low coherence
and is not too dense. For example, when $k = O(1)$, $C = O(1)$ and $\|A\|_F = O(\sqrt{n})$, the error is roughly
$O(\sqrt{m} + \sqrt{n}/m^{1/4})$, which can be as small as $O(n^{1/3})$ depending on the magnitude of $m$. However,
already in a much wider range of parameters we observe an error of $o(\sqrt{kn})$. In fact, in Section C we
illustrate why the Netflix data satisfies the assumptions made here and why they are likely to hold in
other recommender systems.

When $\|A\|_F > \sqrt{n}$, the previous theorem cannot improve on randomized response by more than
a factor of $O(m^{1/4})$. Our next theorem uses a stronger but standard notion of coherence known as
$\mu_0$-coherence. We defer a formal definition of $\mu_0$-coherence to Section 5, but we remark that this
parameter varies between 1 and $m$. Using this notion we are able to obtain improvements roughly of
order $O(\sqrt{m})$.

Theorem 1.2 (Informal version of Theorem 6.3). There is an $(\varepsilon, \delta)$-differentially private algorithm
which, given a matrix $A \in \mathbb{R}^{m \times n}$ with $n \ge m$ and of $\mu_0$-coherence $\mu$ and rank $r \ge 2k$, computes a rank
$2k$ matrix $B$ such that with probability 9/10,

$$\|A - B\|_F \le O(\|A - A_k\|_F) + O_{\varepsilon,\delta}\left(\sqrt{km} + \sqrt{kn} \cdot \sqrt{\frac{\mu k r}{m}}\right).$$

Moreover, the algorithm runs in time $O(kmn)$.

The hidden factor here is the same as before. Note that when $\mu k r = \mathrm{polylog}(n)$, the theorem
can lead to an error bound $\tilde{O}(n^{1/4})$ depending on the magnitude of $m$. Note that this is roughly the
square root of what randomized response would give. But again, under much milder assumptions on
the coherence, the error remains $o(\sqrt{kn})$. Notably, Candes and Tao [CT10] work with a stronger
incoherence assumption than what is needed here. Nevertheless they show that even their stronger
assumption is satisfied in a number of reasonable random matrix models. A slight disadvantage of
the error bound in Theorem 1.2 is that the actual rank $r$ of the matrix enters the picture. Theorem 1.2
hence cannot improve over Theorem 1.1 when the matrix has very large rank. We do not know if the
dependence on $r$ in the above bound is inherent or rather an artifact of our analysis.

Finally, we remark that while our result depends on the $\mu_0$-coherence of the input matrix, our
algorithm does not require knowledge or estimation of the $\mu_0$-coherence of the input matrix. The only
parameters provided to the algorithm are the target rank and the privacy parameters.

Reconstruction attacks and tightness of our results. As it turns out, existing work on "blatant
non-privacy" and reconstruction attacks [DN03] demonstrates that our results are essentially tight
under the given assumptions. To draw this connection, let us first observe why input perturbation
cannot be improved without any assumption on the matrix. To be more precise, by input perturbation
we refer to the method which simply perturbs each entry of the matrix with independent Gaussian
noise of magnitude $O(\varepsilon^{-1}\sqrt{\log(1/\delta)})$, which is sufficient to achieve $(\varepsilon, \delta)$-differential privacy with
respect to unit $\ell_2$ perturbations of the entire matrix. To obtain a rank $k$ approximation to the original
matrix, one can then simply compute the exactly optimal rank $k$ approximation to the perturbed matrix
using the singular value decomposition, which as one can show introduces error $O_{\varepsilon,\delta}(\sqrt{km} + \sqrt{kn})$
compared to the optimal rank $k$ approximation to the original matrix in the Frobenius norm. First,
let us observe that it is not possible in general to have an algorithm which guarantees error in the
Frobenius norm of $o(\sqrt{kn})$ for every matrix $A$ without being blatantly non-private¹, as defined by
[DN03]. This is because there is a simple reduction which starts with an $(\varepsilon, \delta)$-differentially private
algorithm for computing rank $k$ approximations to matrices $A \in \mathbb{R}^{m \times n}$ and gives an $(\varepsilon, \delta)$-differentially
private algorithm which can be used to reconstruct almost every entry in any database $D \in \{0, 1\}^{n'}$
for $n' = k \cdot n$. It is known that $(\varepsilon, \delta)$-private mechanisms do not admit such reconstruction attacks,
and so the result is a lower bound. The reduction follows from the fact that we can always encode a
bit-valued database $D \in \{0, 1\}^{n'}$ for $n' = k \cdot n$ as $k$ rows of an $m \times n$ matrix for any $m \ge k$, simply
by zeroing out all additional $m - k$ rows. Note that the resulting matrix only has rank $k$, and so the
optimal rank $k$ approximation to this matrix has zero error. If we could recover a matrix $A'$ such that
$\|A - A'\|_F = o(\sqrt{kn})$, this would mean that for a typical nonzero row $A_i$ of the matrix with $i \in [k]$,
we would have $\|A_i - A'_i\|_2 = o(\sqrt{n})$, and $\|A_i - A'_i\|_1 \le o(n)$. Then, by simply rounding the entries,
we could reconstruct the original database $D$ in almost all of its entries, giving blatant non-privacy as
defined by [DN03].
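The rounding step admits a simple counting argument: a bit is recovered incorrectly only if the estimate is off by at least 1/2 in that coordinate, so the number of wrong bits is at most four times the squared $\ell_2$ error, which is $o(n)$ whenever the $\ell_2$ error is $o(\sqrt{n})$. The Python sketch below illustrates this on toy data; the function names and the numbers are hypothetical and only serve the illustration.

```python
def round_to_bits(estimate):
    """Round each real-valued estimate to the nearest bit."""
    return [1 if x >= 0.5 else 0 for x in estimate]

def hamming_distance(a, b):
    """Number of coordinates in which two bit vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def squared_l2_error(truth, estimate):
    """Squared Euclidean distance between the database and its estimate."""
    return sum((t - e) ** 2 for t, e in zip(truth, estimate))

# Toy database and a noisy real-valued estimate of it (hypothetical values).
D = [0, 1, 1, 0, 1, 0, 0, 1]
noisy = [0.1, 0.8, 0.4, -0.2, 1.3, 0.6, 0.0, 0.9]

recovered = round_to_bits(noisy)
wrong = hamming_distance(D, recovered)
# Each wrong bit contributes at least (1/2)^2 to the squared error.
bound = 4 * squared_l2_error(D, noisy)
```

Here two of the eight bits are recovered incorrectly, which is within the bound of four times the squared error.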

What is happening in the above example? Intuitively, the problem is that in the rank k matrix we 
construct from D, the k nonzero rows of the matrix are encoded accurately by only k right singular 
vectors. On the other hand, low coherence implies that any k right singular vectors poorly represent a 
set of only k rows. Hence, there is hope to circumvent the above impediment using a low coherence 
assumption on the matrix. Indeed, this is precisely what Theorem 1.1 and Theorem 1.2 demonstrate. 
Nevertheless, reconstruction attacks still lead to lower bounds even under low coherence assumptions. 
Indeed, using the above ideas, the next proposition shows that Theorem 1.1 is essentially tight up to
a factor of $O(\sqrt{k})$. Since in many applications $k = O(1)$, this discrepancy between our upper bound
and the lower bound is often insignificant.

¹ An algorithm $M$ is blatantly non-private if for every database $D \in \{0, 1\}^n$ it is possible to reconstruct a $1 - o(1)$ fraction
of the entries of $D$ exactly, given only the output of the mechanism $M(D)$.

Proposition 1.3. Any algorithm which, given an $m \times n$ matrix $A$ of coherence $C$, outputs a rank $k$
matrix $B$ such that with high probability

$$\|A - B\|_F \le o\left(\sqrt{kn} \cdot \sqrt{\frac{C \|A\|_F}{\sqrt{nm}}}\right)$$

cannot satisfy $(\varepsilon, \delta)$-differential privacy for sufficiently small constants $\varepsilon, \delta$.

Informal proof. For the sake of contradiction, suppose there exists such an algorithm $M$ that satisfies
$(\varepsilon, \delta)$-differential privacy. Then consider a randomized algorithm $M' : \{0, 1\}^{n'} \to \mathbb{R}^{n'}$ which takes a
data set $D \in \{0, 1\}^{n'}$ containing a sensitive bit for $n' = kn$ individuals and encodes it as the $m \times n$ matrix
$A_D$ which contains $D$ in its first $k$ rows and is 0 everywhere else. $M'(D)$ then computes $M(A_D)$ and
outputs the projection of $M(A_D)$ onto the first $k$ rows (thought of as a vector of length $n' = kn$).

We claim that $M'$ is $(\varepsilon, \delta)$-differentially private. This is because the map from $D$ to $A_D$ is sensitivity
preserving and the post-processing computed on $M(A_D)$ preserves the $(\varepsilon, \delta)$-differential privacy of
$M$.

On the other hand, we claim that $M'$ is blatantly non-private. To see this, note that the matrix $A_D$
has coherence $C = \sqrt{m/k}$ and $\|A_D\|_F \le \sqrt{kn}$, so that one can check that $\|A_D - M(A_D)\|_F \le o(\sqrt{kn})$ with
high probability. This implies that $\|D - M'(D)\|_2 \le o(\sqrt{n'})$ with high probability. We therefore also
have $\|D - M'(D)\|_1 \le o(n')$. But in this case we can compute a data set $D'$ from the output of $M'(D)$
such that $\|D - D'\|_0 = o(n')$ by rounding. This is the definition of a reconstruction attack showing that
$M'$ is blatantly non-private. Since $(\varepsilon, \delta)$-differential privacy is known to prevent blatant non-privacy²
for sufficiently small $\varepsilon, \delta > 0$, this presents the contradiction we sought. ■
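The encoding used in the proof can be sketched in a few lines of Python. The helper names below are hypothetical, and the example uses an all-ones database so that the $k$ nonzero rows have equal norm; in that case the coherence of the encoded matrix is exactly $\sqrt{m/k}$ and its Frobenius norm is exactly $\sqrt{kn}$.

```python
import math

def encode_database(bits, k, n, m):
    """Place k*n bits in the first k rows of an m x n matrix; other rows are zero."""
    assert len(bits) == k * n and m >= k
    rows = [[float(b) for b in bits[i * n:(i + 1) * n]] for i in range(k)]
    rows += [[0.0] * n for _ in range(m - k)]
    return rows

def decode_rows(matrix, k):
    """Read the first k rows back as a single vector of length k*n."""
    return [x for row in matrix[:k] for x in row]

def frobenius_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))

def coherence(A):
    """Largest row norm divided by the typical row norm ||A||_F / sqrt(m)."""
    max_row = max(math.sqrt(sum(x * x for x in row)) for row in A)
    return max_row * math.sqrt(len(A)) / frobenius_norm(A)

k, n, m = 2, 6, 8
D = [1] * (k * n)                    # all-ones database: both rows equally heavy
A_D = encode_database(D, k, n, m)
coh = coherence(A_D)                 # equals sqrt(m / k) = 2 here
fro = frobenius_norm(A_D)            # equals sqrt(k * n) here
```

For a general bit database the rows need not have equal norm, so the coherence is only roughly $\sqrt{m/k}$; the sketch fixes the balanced case to keep the arithmetic exact.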

A similar proof shows that error $o(\sqrt{n} \cdot \sqrt{\mu/m})$ (where $\mu$ is the $\mu_0$-coherence of the matrix) cannot
be achieved with $(\varepsilon, \delta)$-differential privacy. This shows that Theorem 1.2 is also tight up to the exact
dependence on $k$ and $r$. We leave it as an intriguing open problem to determine the exact interplay
between coherence and the other parameters.

1.2 Techniques and proof overview 

Our algorithm is based on a random-projection algorithm of Halko, Martinsson and Tropp [HMT11],
which involves two steps: range finding and projection. The range finding algorithm first computes
$k$ Gaussian measurements of $A$, which we denote by $Y = A\Omega$. Here, $A$ is $m \times n$ and $\Omega$ is $n \times k$.
These measurements can be thought of as a random projection of the matrix into a lower dimensional
representation, i.e., $Y$ is $m \times k$. The crux of the analysis in [HMT11] is in arguing that $Y$ already
captures most of the range of $A$. Hence, all that remains to be done is to compute the orthonormal
projection operator $P_Y$ into the span of $Y$, and to compute the projection $P_Y A$. Note that $P_Y A$ is now
a $k$-dimensional approximation of $A$ and, since $Y$ closely approximates the range of $A$, it must be a
good approximation, say, in the Frobenius norm.

The motivation of [HMT11] was to obtain a fast low rank approximation algorithm. Indeed,
[HMT11] give a detailed theoretical analysis and empirical evaluation of the algorithm's performance.

² See, e.g., the proof of Theorem 4.1 in [De11].

Step 1: Privacy preserving range finder and projection. We will leverage the algorithm of [HMT11]
to obtain improved accuracy bounds in the privacy setting. As a first step, we need to be able to carry
out the range finding and projection step in a privacy preserving manner. Our analysis proceeds by
observing that the projection of $A$ to $Y$ approximately preserves all of the $\ell_2$ row-norms of $A$, and so
we can apply a Gaussian perturbation to $Y$, rather than to $A$. (An $m \times k$ standard Gaussian matrix has
Frobenius norm $O(\sqrt{km})$, which is now independent of $n$.) The formal presentation of this part of the
argument appears in Section 4. This step provides an approximation to the range of A which might
already be useful for some applications, but has not yet achieved our goal of computing a low rank
approximation to A itself. For this, we need the projection step discussed next.
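The claim about the size of the perturbation can be checked numerically. The Python sketch below draws a standard Gaussian matrix of the shape of $Y$ and one of the shape of $A$ and compares their Frobenius norms with $\sqrt{km}$ and $\sqrt{mn}$. It uses unit-variance noise and ignores the privacy-dependent scaling, and the dimensions and seed are arbitrary choices made for the illustration.

```python
import math
import random

def gaussian_matrix(rows, cols, rng):
    """Matrix with independent standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def frobenius_norm(A):
    return math.sqrt(sum(x * x for row in A for x in row))

rng = random.Random(0)
m, n, k = 100, 400, 5

# Noise of the shape of the measurement matrix Y (m x k): about sqrt(k * m).
noise_on_Y = frobenius_norm(gaussian_matrix(m, k, rng))
# Noise of the shape of the input matrix A (m x n): about sqrt(m * n).
noise_on_A = frobenius_norm(gaussian_matrix(m, n, rng))

ratio_Y = noise_on_Y / math.sqrt(k * m)
ratio_A = noise_on_A / math.sqrt(m * n)
```

Both ratios come out close to 1, and only the second norm grows with $n$.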

Step 2: Controlling the projection matrix using low coherence. We then show that under our
low-coherence assumption on $A$, the entries of the projection matrix $P_Y$ into the range of $Y$ must be
small in magnitude. Finally, when $P_Y$ has small entries, the final projection step of computing $P_Y A$
has low sensitivity, and although we must now again add a Gaussian perturbation of dimension $m \times n$,
the magnitude of the perturbation in each entry can be smaller than would have been necessary under
naive input perturbation.

In order to obtain bounds on the $\ell_\infty$-norm of the projection operator we make crucial use of the
low-coherence assumption. Here we describe the proof strategy that leads to Theorem 1.2. Theorem 1.1
is somewhat easier to show and follows along similar lines. The first observation is that the Gaussian
measurements taken by the range finding algorithm are mostly linear combinations of the top left
singular vectors of the matrix. But when the matrix $A$ has low coherence, its top left singular
vectors must have very small correlation with the standard basis. This means that the top singular
vectors must have small coordinates. As a result each of the Gaussian measurements we take must
have small $\ell_\infty$-norm relative to the magnitude of the measurement. Some complications arise as we
must add noise to the matrix Y for privacy reasons and then orthonormalize it using the Gram-Schmidt 
orthonormalization algorithm. A key observation is that the noise matrix is generated independently 
of Y. As a result, it must be the case that all columns of the noise matrix have very small inner product 
with the columns of Y. A careful technical argument uses this observation in order to show that the 
effect of noise can be controlled throughout the Gram-Schmidt orthonormalization. The result is a 
projection matrix in which the magnitude of each entry is small whenever the coherence of A was 
small to begin with. 

The exact proof strategy depends on the notion of coherence that we work with. Both notions 
we consider in this paper are presented and analyzed in Section 5. We then also show that small 
$\mu_0$-coherence is indeed a stronger assumption than small $C$-coherence.

1.3 Related Work 
1.3.1 Differential Privacy 

We use as our privacy solution concept the by now standard notion of differential privacy, developed
in a series of papers [BDMN05, CDM+05, DMNS06], and first defined by Dwork, McSherry, Nissim,
and Smith [DMNS06]. The problem of privately computing low-rank approximations to matrix
valued data was one of the first problems studied in the differential privacy literature, first considered
by Blum, Dwork, McSherry, and Nissim [BDMN05], who give an input perturbation based algorithm
for computing the singular value decomposition by directly computing the eigenvector decomposition
of a perturbed covariance matrix. Computing low rank approximations is an extremely useful
primitive for differentially private algorithms, and indeed, McSherry and Mironov [MM09] used the 
algorithm given in [BDMN05] in order to implement and evaluate differentially private versions of 
recommendation algorithms from the Netflix prize competition.

Finding differentially private low-rank approximation algorithms with superior theoretical performance
guarantees to input perturbation methods has remained an open problem. Beating input
perturbation methods for arbitrary symmetric matrices was recently explicitly proposed as an open
problem in [GRU11], who showed that such algorithms would lead to the first efficient algorithm
for privately releasing synthetic data useful for graph cuts which improves over simple randomized
response. Our work does not resolve this open question because our results only improve over input
perturbation methods for matrices with unbalanced dimensions which satisfy a low-coherence
assumption, but it is the first algorithm to improve over [BDMN05] under any condition.

Comparison to recent results of Kapralov, McSherry and Talwar. In a recent independent and
simultaneous work, Kapralov, McSherry, and Talwar [KMT11] give a new polynomial-time algorithm
for computing privacy-preserving rank 1 approximations to symmetric, positive-semidefinite
matrices. Their algorithm achieves $(\varepsilon, 0)$-differential privacy under unit spectral norm perturbations
to the matrix. Their algorithm outputs a vector $v$ such that for all $\alpha > 0$,
$\mathbb{E}[v^T A v] \ge (1 - \alpha)\|A\| - O(n \log(1/\alpha)/(\varepsilon\alpha))$ (where $\|\cdot\|$ denotes the spectral norm), and they show that this is nearly tight for
$(\varepsilon, 0)$-differential privacy guarantees. Our results are therefore strictly incomparable. In this work,
the goal is to achieve error $o(\sqrt{kn})$ (i.e., $o(\sqrt{n})$ for rank-1 approximations) assuming low coherence
(a stronger error bound), under $(\varepsilon, \delta)$-differential privacy (a weaker privacy guarantee), and without
making any assumptions about symmetry or positive-semidefiniteness.

1.3.2 Fast Computation of Low Rank Matrix Approximations 

There is also an extensive literature on randomized algorithms for computing approximately optimal
low rank matrix approximations, motivated by improving the running time of the exact singular
value decompositions. This literature originated with the work of Papadimitriou et al. [PTRV98] and
Frieze, Kannan, and Vempala [FKV04], who gave algorithms based on random projections and column
sampling (in both cases with the goal of decreasing the dimension of the matrix). Achlioptas and
McSherry [AM01] give fast algorithms for computing low rank approximations based on randomly
perturbing the original matrix (which can be done to induce sparsity). Although [AM01] pre-dated
the privacy literature, some of the algorithms presented in it can be viewed as privacy preserving,
because perturbing the actual matrix with appropriately scaled Gaussian noise is a privacy preserving
operation sometimes referred to as randomized response. When appropriately scaled (for privacy)
Gaussian noise is added to an $m \times n$ matrix, it results in an algorithm for approximating the best rank
$k$ approximation up to an additive error of $O(\sqrt{k(m + n)})$ in the Frobenius norm.

Our algorithms are most closely related to the very recent work of Halko, Martinsson, and Tropp
[HMT11], who give fast algorithms for computing low rank approximations based on two steps: range
finding, and projection. As already discussed, in the first step, these algorithms project the matrix $A$
into an $m \times k$ matrix $Y$ which approximately captures the range of $A$. Then $A$ is projected into the
range of $Y$, which yields a rank $k$ matrix which gives a good approximation to $A$ if a good rank-$k$
approximation exists. We will further discuss the algorithm of [HMT11] and our modifications in the
course of the paper.

1.3.3 Low Coherence Conditions 

Low coherence conditions have recently been studied in a number of papers for a number of matrix
problems, and low coherence is a commonly satisfied condition on matrices. Recently, Candes and Recht [CR09]
and Candes and Tao [CT10] considered the problem of matrix completion. Matrix completion is the
problem of recovering all entries of a matrix from only a randomly sampled subset of its entries.
This problem is inspired by the Netflix prize recommendation problem, in which
a matrix is given, with individuals on the rows, movies on the columns, and in which the matrix entries
correspond to individual movie ratings. The matrix provides only a small number of movie ratings per
individual, and the challenge is to predict the missing entries in the matrix. Clearly accurate matrix
completion is impossible for arbitrary matrices, but [CR09, CT10] show the remarkable result that it
is possible under low coherence assumptions. Candes and Tao [CT10] also show that almost every
matrix satisfies a low coherence condition, in the sense that randomly generated matrices will have low
coherence with extremely high probability.

Talwalkar and Rostamizadeh recently used low-coherence assumptions for the problem of (non-private)
low-rank matrix approximation [TR10]. A common heuristic for speeding the computation
of low-rank matrix approximations is to compute on only a small randomly chosen subset of the
columns, rather than on the entire matrix. [TR10] showed that under low-coherence assumptions
similar to those of [CR09, CT10], the spectrum of a matrix is in fact well approximated by a small
number of randomly sampled columns, and give formal guarantees on the approximation quality of
the sampling based Nystrom method of low-rank matrix approximation.

2 Preliminaries

We view our dataset as a real valued matrix $A \in \mathbb{R}^{m \times n}$. We sometimes denote the $i$-th row of a matrix by
$A_{(i)}$. Let

$$N = \{P \in \mathbb{R}^{m \times n} : \text{there exists an index } i \in [m] \text{ such that } \|P_{(i)}\|_2 \le 1 \text{ and } \|P_{(j)}\|_2 = 0 \text{ for all } j \ne i\} \qquad (1)$$

denote the set of matrices that take the value 0 in all entries, except possibly in a single row, which has Euclidean
norm at most 1.

Definition 2.1. We say that two matrices $A, A' \in \mathbb{R}^{m \times n}$ are neighboring if $(A - A') \in N$.

We use the by now standard privacy solution concept of differential privacy: 

Definition 2.2. An algorithm $M : \mathbb{R}^{m \times n} \to R$ (where $R$ is some arbitrary abstract range) is $(\varepsilon, \delta)$-differentially
private if for all pairs of neighboring databases $A, A' \in \mathbb{R}^{m \times n}$, and for all subsets of the
range $S \subseteq R$:

$$\Pr\{M(A) \in S\} \le \exp(\varepsilon) \Pr\{M(A') \in S\} + \delta.$$

We make use of the following useful facts about differential privacy.

Fact 2.3. If $M : \mathbb{R}^{m \times n} \to R$ is $(\varepsilon, \delta)$-differentially private, and $M' : R \to R'$ is an arbitrary randomized
algorithm mapping $R$ to $R'$, then $M'(M(\cdot)) : \mathbb{R}^{m \times n} \to R'$ is $(\varepsilon, \delta)$-differentially private.

The following useful theorem of Dwork, Rothblum, and Vadhan tells us how differential privacy 
guarantees compose. 

Theorem 2.4 (Composition [DRV10]). Let $\varepsilon, \delta \in (0, 1)$, $\delta' > 0$. If $M_1, \ldots, M_k$ are each $(\varepsilon, \delta)$-differentially
private algorithms, then the algorithm $M(A) = (M_1(A), \ldots, M_k(A))$ releasing the concatenation
of the results of each algorithm is $(k\varepsilon, k\delta)$-differentially private. It is also $(\varepsilon', k\delta + \delta')$-differentially
private for:

$$\varepsilon' < \sqrt{2k \ln(1/\delta')}\,\varepsilon + 2k\varepsilon^2.$$
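To see how the two guarantees of the composition theorem compare, the following Python sketch evaluates $k\varepsilon$ against the right-hand side $\sqrt{2k\ln(1/\delta')}\,\varepsilon + 2k\varepsilon^2$. The function names and the parameter values are illustrative assumptions.

```python
import math

def basic_composition(eps, delta, k):
    """(k * eps, k * delta) guarantee for k mechanisms that are each (eps, delta)-private."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_prime):
    """Right-hand side of the second bound: sqrt(2k ln(1/delta')) * eps + 2k * eps^2."""
    eps_total = math.sqrt(2 * k * math.log(1 / delta_prime)) * eps + 2 * k * eps ** 2
    return eps_total, k * delta + delta_prime

# Hypothetical setting: 100 mechanisms, each (0.01, 1e-8)-differentially private.
k, eps, delta, delta_prime = 100, 0.01, 1e-8, 1e-6
basic = basic_composition(eps, delta, k)
advanced = advanced_composition(eps, delta, k, delta_prime)
```

With these values the basic bound gives a total privacy parameter of 1, while the second bound gives roughly 0.55, at the price of the additional $\delta'$ in the failure probability. For small $k$ or large $\varepsilon$ the basic bound can be the smaller of the two.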

We denote the 1-dimensional Gaussian distribution of mean $\mu$ and variance $\sigma^2$ by $N(\mu, \sigma^2)$. We
use $N(\mu, \sigma^2)^d$ to denote the distribution over $d$-dimensional vectors with i.i.d. coordinates sampled
from $N(\mu, \sigma^2)$. We write $X \sim D$ to indicate that a variable $X$ is distributed according to a distribution
$D$. We note the following useful fact about the Gaussian distribution.

Fact 2.5. If $g_i \sim N(\mu_i, \sigma_i^2)$ are independent, then $\sum_i g_i \sim N(\sum_i \mu_i, \sum_i \sigma_i^2)$.

The following theorem is well known folklore. We include a proof in the appendix for completeness.

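Theorem 2.6 (Gaussian Mechanism). Let $x, y \in \mathbb{R}^d$ be any two vectors such that $\|x - y\|_2 \le c$. Let
$Y \in \mathbb{R}^d$ be an independent random draw from $N(0, \rho^2)^d$, where $\rho = c\varepsilon^{-1}\sqrt{\log(1.25/\delta)}$. Then for any
$S \subseteq \mathbb{R}^d$:

$$\Pr\{x + Y \in S\} \le \exp(\varepsilon) \Pr\{y + Y \in S\} + \delta.$$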
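A minimal Python sketch of the Gaussian mechanism follows. It adds independent noise of standard deviation $\rho = c\,\varepsilon^{-1}\sqrt{\log(1.25/\delta)}$ to every coordinate of the input vector. The function names are hypothetical, and the natural logarithm is assumed for $\log$.

```python
import math
import random

def gaussian_noise_scale(c, eps, delta):
    """Standard deviation rho = c * sqrt(log(1.25 / delta)) / eps (natural log assumed)."""
    return c * math.sqrt(math.log(1.25 / delta)) / eps

def gaussian_mechanism(x, c, eps, delta, rng):
    """Return x + Y, where Y has i.i.d. N(0, rho^2) coordinates."""
    rho = gaussian_noise_scale(c, eps, delta)
    return [xi + rng.gauss(0.0, rho) for xi in x]

# Release a vector whose l2-sensitivity is assumed to be c = 1.
rng = random.Random(0)
noisy = gaussian_mechanism([0.0, 1.0, 2.0], c=1.0, eps=0.5, delta=1e-6, rng=rng)
```

The noise scale is linear in the sensitivity bound $c$ and inversely proportional to $\varepsilon$, and it grows only with the square root of $\log(1/\delta)$.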

Vector and matrix norms. We denote by $\|\cdot\|_p$ the $\ell_p$-norm of a vector and sometimes use $\|\cdot\|$ as a shorthand for the Euclidean norm. Given a real $m\times n$ matrix $A$, we will work with the spectral norm $\|A\|_2$ and the Frobenius norm $\|A\|_F$ defined as

$$\|A\|_2 = \max_{\|x\|=1}\|Ax\| \quad\text{and}\quad \|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}. \qquad (2)$$

For any $m\times n$ matrix $A$ of rank $r$ we have $\|A\|_2 \le \|A\|_F \le \sqrt{r}\cdot\|A\|_2$. For a matrix $Y$ we denote by $P_Y$ the orthonormal projection operator onto the range of $Y$.

Fact 2.7. $P_Y = Y(Y^TY)^{-1}Y^T$

Fact 2.8 (Submultiplicativity). For any $m\times n$ matrix $A$ and $n\times r$ matrix $B$ we have $\|AB\|_F \le \|A\|_F\cdot\|B\|_F$.

Theorem 2.9 (Weyl). For any $m\times n$ matrices $A,E$, we have $|\sigma_i(A+E) - \sigma_i(A)| \le \|E\|_2$, where $\sigma_i(M)$ denotes the $i$-th singular value of a matrix $M$.

3 Low-rank approximation via Gaussian measurements 

We will begin by presenting an algorithm of Halko, Martinsson and Tropp [HMT11] as described in Figure 1. The algorithm produces a rank $r+p$ approximation that already for $p \ge 2$ closely matches the best rank $r$ approximation of the matrix in Frobenius norm. The guarantees of the algorithm are detailed in Theorem 3.1.

Input: Matrix $A \in \mathbb{R}^{m\times n}$, target rank $r$, oversampling parameter $p$.

1. Range finder: Let $\Omega$ be an $n\times k$ standard Gaussian matrix where $k = p + r$. Compute the $m\times k$ measurement matrix $Y = A\Omega$. Compute the orthonormal projection operator $P_Y$.

2. Projection: Compute the projection $B = P_YA$.

Output: Matrix $B$ of rank $k$.

Figure 1: Base algorithm for computing a low-rank approximation

Theorem 3.1 ([HMT11]). Suppose that $A$ is a real $m\times n$ matrix with singular values $\sigma_1 \ge \sigma_2 \ge \dots$. Choose a target rank $r \ge 2$ and an oversampling parameter $p \ge 2$ where $r + p \le \min\{m,n\}$. Draw an $n\times(r+p)$ standard Gaussian matrix $\Omega$, construct the sample matrix $Y = A\Omega$ and let $B = P_YA$. Then the expected approximation error in Frobenius norm satisfies

$$E\|A-B\|_F \le \Big(1+\frac{r}{p-1}\Big)^{1/2}\Big(\sum_{j>r}\sigma_j^2\Big)^{1/2}. \qquad (3)$$

In particular, for $p = r+1$ we have

$$E\|A-B\|_F \le \sqrt{2}\cdot\|A-A_r\|_F. \qquad (4)$$

When applying the theorem we will use Markov's inequality to argue that the error bounds hold with sufficiently high probability up to a constant factor loss. As shown in [HMT11], much better bounds on the failure probability are possible. We will omit the precise bounds here for the sake of simplicity.

4 Privacy-preserving sub-routines: Range finder and projection

In order to give a privacy-preserving variant of the above algorithm, we will first need to carefully bound the sensitivity of the range finder and of the projection step, and bound the effect of the necessary perturbations. We do this for each step in this section.

4.1 Privacy-preserving range finder

In this section we present a privacy-preserving algorithm which finds a set of vectors $\tilde Y$ whose span contains most of the spectrum of a given matrix $A$.

Input: Matrix $A \in \mathbb{R}^{m\times n}$, target rank $r$, oversampling parameter $p$ such that $r + p \le \min\{m,n\}$, privacy parameters $\varepsilon,\delta \in (0,1)$.

1. Let $\Omega$ be an $n\times k$ standard Gaussian matrix where $k = p + r$.

2. Compute the $m\times k$ measurement matrix $Y = A\Omega$.

3. Let $N \sim N(0,\rho^2)^{m\times k}$ where $\rho = 2\varepsilon^{-1}\sqrt{2k\log(4k/\delta)}$.

4. Let $\tilde Y = Y + N$.

5. Orthonormalize the columns of $\tilde Y$ and let the result be $W$.

Output: Orthonormal $m\times k$ matrix $W$.

Figure 2: Privacy-preserving range finder

Lemma 4.1. The algorithm in Figure 2 satisfies $(\varepsilon,\delta)$-differential privacy.

Proof. We argue that outputting $\tilde Y$ preserves $(\varepsilon,\delta)$-differential privacy. That outputting $W$ preserves $(\varepsilon,\delta)$-differential privacy follows from the fact that differential privacy holds under arbitrary post-processing.

Consider any two neighboring matrices $A, A' \in \mathbb{R}^{m\times n}$ differing in their $i$-th row, and let $Y = A\Omega$ and $Y' = A'\Omega$. Define $e \in \mathbb{R}^n$ to be $e^T = A_{(i)} - A'_{(i)}$. Note that by the definition of neighboring, we must have $\|e\|_2 \le 1$. Observe that for each row $j \ne i$, we have $Y_{(j)} = Y'_{(j)}$, and define $\tilde e \in \mathbb{R}^k$ to be $\tilde e = Y_{(i)} - Y'_{(i)} = e^T\Omega$. First, we will give a high-probability bound on $\|\tilde e\|_2$. Observe that for each $j \in [k]$, $\tilde e_j$ is distributed like a standard Gaussian:

$$\tilde e_j = \sum_{\ell=1}^n e_\ell\,\Omega_{\ell j} \sim N\Big(0, \sum_{\ell=1}^n e_\ell^2\Big) = N(0,1),$$

where we used Fact 2.5. Therefore, we have for any $t \ge 1$, by standard Gaussian tail bounds,

$$\Pr\{|\tilde e_j| \ge t\} \le 2\exp(-t^2/2).$$

Taking a union bound over all $k$ coordinates we have:

$$\Pr\Big\{\max_{j\in[k]}|\tilde e_j| \ge \sqrt{2\log(4k/\delta)}\Big\} \le \frac{\delta}{2}.$$

In particular, we have except with probability $\delta/2$,

$$\|\tilde e\|_2 \le \sqrt{2k\log(4k/\delta)}. \qquad (5)$$

Note that we have set $\rho$ such that conditioned on Equation 5 (which holds with probability at least $1-\delta/2$) we have the following by Theorem 2.6: For every set $S \subseteq \mathbb{R}^{m\times k}$, $\Pr\{\tilde Y \in S\} \le \exp(\varepsilon)\Pr\{\tilde Y' \in S\} + \delta/2$. Hence, without any conditioning we can say:

$$\Pr\{\tilde Y \in S\} \le \exp(\varepsilon)\Pr\{\tilde Y' \in S\} + \delta,$$

which completes the proof of privacy. ■
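The steps of Figure 2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, matrices are plain lists of rows, the natural logarithm is assumed in the formula for $\rho$, and orthonormalization is done with modified Gram-Schmidt.

```python
import math
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def orthonormalize_columns(Y):
    """Modified Gram-Schmidt on the columns of Y (assumes full column rank)."""
    basis = []
    for col in zip(*Y):
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return [list(row) for row in zip(*basis)]


def private_range_finder(A, r, p, eps, delta, rng):
    """Sketch of Figure 2: noisy Gaussian measurements followed by orthonormalization."""
    n = len(A[0])
    k = r + p
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]  # step 1
    Y = matmul(A, omega)                                                 # step 2
    rho = (2.0 / eps) * math.sqrt(2 * k * math.log(4 * k / delta))       # step 3
    Y_noisy = [[y + rng.gauss(0.0, rho) for y in row] for row in Y]      # step 4
    return orthonormalize_columns(Y_noisy)                               # step 5


rng = random.Random(0)
A_example = [[rng.random() for _ in range(4)] for _ in range(6)]
W_example = private_range_finder(A_example, 1, 1, 0.5, 1e-3, rng)
```

The returned matrix has $k = r + p$ orthonormal columns of length $m$; only the noisy measurements are ever orthonormalized, which is what lets the privacy argument rest on post-processing.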

Theorem 4.2. Let $A$ be an $m\times n$ matrix with singular values $\sigma_1 \ge \sigma_2 \ge \dots$. Then, given $A$ and valid parameters $r,p,\varepsilon,\delta$, the algorithm in Figure 2 returns a matrix $W$ such that $W$ satisfies $(\varepsilon,\delta)$-differential privacy, and moreover we have the error bound

$$E\|A - WW^TA\|_F \le \Big(1+\frac{r}{p-1}\Big)^{1/2}\Big(\sum_{j>r}(\sigma_j+\rho)^2\Big)^{1/2}. \qquad (6)$$

Proof. Privacy follows from Lemma 4.1. Let us therefore argue the second part of the theorem. Consider the $m\times(n+m)$ matrix

$$A' = [A \mid \rho I_{m\times m}],$$

where $I_{m\times m}$ is the $m\times m$ identity matrix. Let $\Omega'$ denote a random $(m+n)\times k$ Gaussian matrix. Note that

$$\tilde Y \sim A'\Omega'.$$

That is, $\tilde Y$ is distributed the same way as $Y' = A'\Omega'$. Here, we're using the fact that $\rho N(0,1) = N(0,\rho^2)$.

On the other hand, by Theorem 3.1, we know that $Y'$ is a good range for $A'$ in the sense that

$$E\|A' - P_{Y'}A'\|_F \le \Big(1+\frac{r}{p-1}\Big)^{1/2}\Big(\sum_{j>r}\sigma_j'^2\Big)^{1/2}. \qquad (7)$$

Here, $\sigma'_j$ denotes the $j$-th largest singular value of $A'$.

Claim 4.3. $\|A - P_{Y'}A\|_F \le \|A' - P_{Y'}A'\|_F$

Proof. The claim is immediate, because we can obtain $A$ from $A'$ by truncating the last $m$ columns. Hence, the approximation error can only decrease. ■

Claim 4.4. For all $j$, we have $|\sigma_j - \sigma'_j| \le \rho$.

Proof. Consider the matrix $A_0 = [A \mid 0_{m\times m}]$ where $0_{m\times m}$ is the all zeros matrix. Note that

$$A' = A_0 + E \quad\text{with}\quad E = [0_{m\times n} \mid \rho I_{m\times m}].$$

Also, $\sigma_j = \sigma_j(A_0)$, since we just appended an all zeros matrix. On the other hand,

$$\|E\|_2 = \|\rho I_{m\times m}\|_2 = \rho.$$

Hence, by Weyl's perturbation bound (Theorem 2.9)

$$|\sigma_j - \sigma'_j| = |\sigma_j(A_0) - \sigma'_j| \le \|E\|_2 = \rho.$$

■

Combining the previous claims with (7), we have

$$E\|A - P_{Y'}A\|_F \le E\|A' - P_{Y'}A'\|_F \le \Big(1+\frac{r}{p-1}\Big)^{1/2}\Big(\sum_{j>r}(\sigma_j+\rho)^2\Big)^{1/2}.$$

Since $Y'$ and $\tilde Y$ are identically distributed, the same claim is true when replacing $P_{Y'}$ by $P_{\tilde Y}$. Furthermore, $P_{\tilde Y} = WW^T$ and so the claim follows. ■

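Corollary 4.5. Let $A \in \mathbb{R}^{m\times n}$ be as in the previous theorem. Assume that $m \le n$ and run the algorithm with $p \ge r+1$. Then, with probability 99/100,

$$\|A - WW^TA\|_F \le O\Big(\sqrt{\sum_{j>r}\sigma_j^2} + \frac{\sqrt{km\log(k/\delta)}}{\varepsilon}\Big).$$

Proof. By Markov's inequality and the previous theorem, we have with probability 99/100,

$$\|A - P_{\tilde Y}A\|_F \le O\Big(\sqrt{\sum_{j>r}(\sigma_j+\rho)^2}\Big).$$

But note that $(\sigma_j+\rho)^2 = \sigma_j^2 + 2\sigma_j\rho + \rho^2 \le 3\sigma_j^2 + 3\rho^2$. This is because either $\sigma_j \ge \rho$ and thus $\sigma_j^2 \ge \sigma_j\rho$, or else $\rho > \sigma_j$ in which case $\rho^2 \ge \sigma_j\rho$. Since $m \le n$, the sum has at most $m$ terms, so the contribution of the $\rho^2$ terms is at most $3m\rho^2$. The claim follows by using that $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ for non-negative $a,b \ge 0$, and plugging in the value of $\rho$ from Figure 2. ■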



4.2 Privacy-preserving projections 

In the previous section we showed a privacy-preserving algorithm that finds a small number of orthonormal vectors $W$ such that $A$ is well-approximated by $WW^TA$. To obtain a privacy-preserving low-rank approximation algorithm we still need to show how to carry out the projection step in a privacy-preserving fashion. We analyze the error of the projection step in terms of the magnitude of the maximum entry of each column of $W$. This serves to bound the sensitivity of the matrix multiplication operation. The smaller the entries of $W$, the smaller the overall error that we incur.

Input: Matrix $A \in \mathbb{R}^{m\times n}$, matrix $W \in \mathbb{R}^{m\times k}$ whose columns have norm at most 1, privacy parameters $\varepsilon,\delta \in (0,1)$.

1. Let $W = [w_1 \mid w_2 \mid \cdots \mid w_k]$ and for each $j \in [k]$ let $\alpha_j = \|w_j\|_\infty$ denote the maximum magnitude entry in $w_j$.

2. Let $N$ be a random $k\times n$ matrix where $N_{ij} \sim N(0,\alpha_i^2\rho^2)$ for $i \in [k]$, $j \in [n]$ and $\rho = 2\varepsilon^{-1}\sqrt{8k\ln(4k/\delta)\ln(2/\delta)}$.

3. Compute the matrix $B = W(W^TA + N)$.

Output: Matrix $B$ of rank $k$.

Figure 3: Privacy-preserving projection
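The projection step of Figure 3 can be sketched as follows. This is an illustration under assumptions rather than a reference implementation: the function name is hypothetical, matrices are lists of rows, and the noise scale $\rho$ is the one read from Figure 3 with natural logarithms.

```python
import math
import random


def private_projection(A, W, eps, delta, rng):
    """Sketch of Figure 3: returns B = W (W^T A + N) plus the alphas and rho used."""
    m, n = len(A), len(A[0])
    k = len(W[0])
    # Step 1: alpha_j is the largest entry magnitude in column j of W.
    alphas = [max(abs(W[i][j]) for i in range(m)) for j in range(k)]
    # Step 2: row j of the noise matrix has standard deviation alpha_j * rho.
    rho = (2.0 / eps) * math.sqrt(8 * k * math.log(4 * k / delta) * math.log(2 / delta))
    noisy = [
        [sum(W[i][j] * A[i][c] for i in range(m)) + rng.gauss(0.0, alphas[j] * rho)
         for c in range(n)]
        for j in range(k)
    ]
    # Step 3: multiply back by W to obtain the rank-k output.
    B = [[sum(W[i][j] * noisy[j][c] for j in range(k)) for c in range(n)]
         for i in range(m)]
    return B, alphas, rho


rng = random.Random(1)
W_demo = [[0.5], [0.5], [0.5], [0.5]]  # one unit-norm column with small entries
A_demo = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
B_demo, alphas_demo, rho_demo = private_projection(A_demo, W_demo, 1.0, 0.1, rng)
```

A column of $W$ with small entries (here $\alpha_1 = 1/2$) receives proportionally less noise, which is the reason the analysis tracks the $\alpha_j$.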



Lemma 4.6. The output $B$ of the algorithm satisfies $(\varepsilon,\delta)$-differential privacy.

Proof. We will argue that releasing $W^TA + N$ preserves $(\varepsilon,\delta)$-differential privacy. That releasing $B$ preserves differential privacy follows from the fact that differential privacy does not degrade under arbitrary post-processing. Fix any two neighboring matrices $A, A'$ differing in their $i$-th row. Let $E = A - A'$ and let $e^T = A_{(i)} - A'_{(i)} = E_{(i)}$. Recall that by the definition of neighboring, $\|e\|_2 \le 1$, and for all other $j \ne i$, $\|E_{(j)}\|_2 = 0$. For any $j \in [k]$, consider the $j$-th row of $W^TE$, writing $w_{j\ell}$ for the $\ell$-th entry of $w_j$:

$$\|(W^TE)_{(j)}\|_2 = \Big\|\sum_{\ell=1}^m w_{j\ell}E_{(\ell)}\Big\|_2 = \|w_{ji}\cdot e^T\|_2 \le \alpha_j\|e\|_2 \le \alpha_j.$$

Hence, by Theorem 2.6, releasing $(W^TA)_{(j)} + g$ where $g \sim N(0,\alpha_j^2\rho^2)^n$ preserves $\big(\frac{\varepsilon}{\sqrt{8k\ln(2/\delta)}}, \frac{\delta}{2k}\big)$-differential privacy. Finally, we apply Theorem 2.4 to see that releasing each of the $k$ rows of $W^TA + N$ preserves $(\varepsilon', k(\delta/2k) + \delta/2) = (\varepsilon',\delta)$-differential privacy for:

$$\varepsilon' \le \sqrt{2k\ln(2/\delta)}\cdot\frac{\varepsilon}{\sqrt{8k\ln(2/\delta)}} + 2k\Big(\frac{\varepsilon}{\sqrt{8k\ln(2/\delta)}}\Big)^2 \le \varepsilon$$

as desired. ■
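The arithmetic in the last step can be checked numerically. The sketch below (function name hypothetical, natural logarithm assumed) plugs the per-row guarantee into the composition bound of Theorem 2.4 with $\delta' = \delta/2$.

```python
import math


def projection_privacy_budget(eps, delta, k):
    """Compose k row releases, each (eps_row, delta/(2k))-DP, using slack delta/2."""
    eps_row = eps / math.sqrt(8 * k * math.log(2 / delta))
    delta_row = delta / (2 * k)
    delta_slack = delta / 2
    eps_total = (math.sqrt(2 * k * math.log(1 / delta_slack)) * eps_row
                 + 2 * k * eps_row ** 2)
    delta_total = k * delta_row + delta_slack
    return eps_total, delta_total


budgets = [projection_privacy_budget(e, 1e-3, 10) for e in (0.1, 0.5, 0.9)]
```

The first term evaluates to $\varepsilon/2$ and the second to $\varepsilon^2/(4\ln(2/\delta))$, which is below $\varepsilon/2$ for $\varepsilon,\delta \in (0,1)$.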



Theorem 4.7. The algorithm above returns a matrix $B$ such that $B$ satisfies $(\varepsilon,\delta)$-differential privacy and moreover with probability 99/100,

$$\|A - B\|_F \le \|A - WW^TA\|_F + O\Big(\sqrt{k\sum_{j=1}^k \alpha_j^2\rho^2 n}\Big).$$

In particular if $\max_j \alpha_j \le \alpha$, we have with the same probability,

$$\|A - B\|_F \le \|A - WW^TA\|_F + O\Big(\frac{\alpha k\log(k/\delta)\sqrt{n}}{\varepsilon}\Big).$$

Proof.

$$\|A - B\|_F = \|A - W(W^TA + N)\|_F = \|A - WW^TA - WN\|_F \le \|A - WW^TA\|_F + \|WN\|_F$$

But $\|W\|_F \le \sqrt{k}$ so that, by Fact 2.8,

$$\|WN\|_F \le \|W\|_F\|N\|_F \le \sqrt{k}\cdot\|N\|_F.$$

On the other hand, by Jensen's inequality and linearity of expectation,

$$E\|N\|_F \le \sqrt{E\|N\|_F^2} = \sqrt{\sum_{i=1}^k\sum_{j=1}^n E\,N_{ij}^2} = \sqrt{n\sum_{i=1}^k \alpha_i^2\rho^2}.$$

The claim now follows from Markov's inequality. ■

Note that the quantities $\alpha_j$ are always bounded by 1, since all $w_j$'s are unit vectors. In the next section, we will show that under certain incoherence assumptions, we will have (or will be able to enforce) the condition that the $\alpha_j$ values are bounded significantly below 1.



5 Incoherent matrices 

Intuitively speaking, a matrix is incoherent if its left singular vectors have low correlation with the standard basis vectors. There are multiple ways to formalize this intuition. Here, we will work with two natural notions of coherence. In both cases we will be able to show that we can find, in a privacy-preserving way, projection operators that have small entries. As demonstrated in the previous section, this directly leads to improvements over randomized response.

5.1 C-coherent matrices

In this section we work with matrices $A$ in which row norms do not deviate by too much from the typical row norm. Another way to look at this condition is that coordinate projections provide little spectral information about the matrix $A$. From this angle the condition we need can be interpreted as low coherence in the sense that the singular vectors of $A$ that correspond to large singular values must be far from the standard basis.

Definition 5.1 (C-coherence). We say that a matrix $A \in \mathbb{R}^{m\times n}$ is $C$-coherent if

$$\max_{i\in[m]}\|e_i^TA\| \le C\cdot\frac{\|A\|_F}{\sqrt{m}}.$$

Note that we have $1 \le C \le \sqrt{m}$.
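To make Definition 5.1 concrete, the smallest admissible $C$ for a given matrix is the largest row norm divided by the typical row norm $\|A\|_F/\sqrt{m}$. The helper below is a hypothetical illustration using plain lists of rows.

```python
import math


def c_coherence(A):
    """Smallest C such that A is C-coherent: max row norm * sqrt(m) / ||A||_F."""
    m = len(A)
    frobenius = math.sqrt(sum(x * x for row in A for x in row))
    max_row_norm = max(math.sqrt(sum(x * x for x in row)) for row in A)
    return max_row_norm * math.sqrt(m) / frobenius


# All rows equal: perfectly balanced, so C attains its lower bound 1.
balanced = [[1.0, 1.0, 1.0] for _ in range(4)]
# All mass in one row: C attains its upper bound sqrt(m) = 2.
spiky = [[1.0, 2.0, 2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
c_balanced = c_coherence(balanced)
c_spiky = c_coherence(spiky)
```

The two examples hit the extremes $C = 1$ and $C = \sqrt{m}$ noted after the definition.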

The next lemma shows that sparse vectors have poor correlation with the matrix in the above sense. We say a vector $w$ is $\ell$-sparse if it has at most $\ell$ nonzero coordinates.

Lemma 5.2. Let $A \in \mathbb{R}^{m\times n}$ be a $C$-coherent matrix. Let $w$ be an $\ell$-sparse unit vector in $\mathbb{R}^m$. Then,

$$\|w^TA\| \le C\sqrt{\frac{\ell}{m}}\cdot\|A\|_F.$$

Proof. Since $w$ is $\ell$-sparse we can write it as $w = \sum_{i=1}^{\ell}\alpha_ie_i$ where $e_1,\dots,e_\ell$ are $\ell$ distinct standard basis vectors and $\sum_{i=1}^{\ell}\alpha_i^2 = 1$. Hence,

$$\|w^TA\| = \Big\|\sum_{i=1}^{\ell}\alpha_ie_i^TA\Big\| \le \sum_{i=1}^{\ell}|\alpha_i|\,\|e_i^TA\| \le C\sqrt{\frac{\ell}{m}}\cdot\|A\|_F.$$

In the last step we used the Cauchy-Schwarz inequality and the fact that $A$ is $C$-coherent. ■

Lemma 5.3. Let $\alpha > 0$. Let $A \in \mathbb{R}^{m\times n}$ be a $C$-coherent matrix. Let $w \in \mathbb{R}^m$ be a unit vector and suppose $w_\alpha$ is the vector obtained from $w$ by zeroing all coordinates greater than $\alpha$. Then,

$$w_\alpha^TA = w^TA + e$$

where $e$ is a vector of norm

$$\|e\| \le \frac{C\|A\|_F}{\alpha\sqrt{m}}.$$

Proof. Note that $w - w_\alpha$ is an $\ell$-sparse vector with $\ell \le 1/\alpha^2$. Here we used that $w$ is a unit vector and hence there can be at most $1/\alpha^2$ coordinates larger than $\alpha$. The lemma now follows directly from Lemma 5.2. ■
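The truncation bound of Lemma 5.3 can be checked on a small example. In this sketch the helper names are hypothetical, and "coordinates greater than $\alpha$" is interpreted as coordinates whose absolute value exceeds $\alpha$, which is an assumption about the intended reading.

```python
import math


def truncate(w, alpha):
    """Zero every coordinate of w whose magnitude exceeds alpha."""
    return [x if abs(x) <= alpha else 0.0 for x in w]


def row_times_matrix(w, A):
    """Compute the row vector w^T A for a matrix stored as a list of rows."""
    return [sum(w[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def truncation_error_and_bound(w, A, alpha):
    """Return (||w_alpha^T A - w^T A||, C ||A||_F / (alpha sqrt(m)))."""
    m = len(A)
    frobenius = math.sqrt(sum(x * x for row in A for x in row))
    max_row_norm = max(math.sqrt(sum(x * x for x in row)) for row in A)
    coherence = max_row_norm * math.sqrt(m) / frobenius  # smallest valid C
    exact = row_times_matrix(w, A)
    approx = row_times_matrix(truncate(w, alpha), A)
    error = math.sqrt(sum((a - b) ** 2 for a, b in zip(exact, approx)))
    return error, coherence * frobenius / (alpha * math.sqrt(m))


A_small = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 0.0, 1.0]]
w_unit = [0.8, 0.6, 0.0, 0.0]  # unit vector; only the 0.8 entry exceeds alpha = 0.7
err, bound = truncation_error_and_bound(w_unit, A_small, 0.7)
```

Here the error equals $0.8$ times the norm of the first row of the example matrix, well inside the bound.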

The next lemma is a straightforward extension of the previous one for the case where we multiply $A$ by a matrix $W$ rather than a single vector.

Lemma 5.4. Let $\alpha > 0$. Let $A \in \mathbb{R}^{m\times n}$ be a $C$-coherent matrix. Let $W \in \mathbb{R}^{m\times k}$ be a matrix whose columns have unit length. Suppose $W_\alpha$ is the matrix obtained from $W$ by zeroing all entries greater than $\alpha$. Then,

$$WW_\alpha^TA = WW^TA + E$$

where $E$ is a matrix of Frobenius norm

$$\|E\|_F \le \frac{Ck\|A\|_F}{\alpha\sqrt{m}}.$$

Proof. By the previous lemma, we have $W_\alpha^TA = W^TA + E'$, where every row of $E'$ has Euclidean norm at most $C\|A\|_F/(\alpha\sqrt{m})$. Hence, $\|E'\|_F \le C\sqrt{k}\,\|A\|_F/(\alpha\sqrt{m})$. But then

$$WW_\alpha^TA = WW^TA + WE'.$$

Put $E = WE'$ and note that $\|E\|_F \le \|W\|_F\|E'\|_F = \sqrt{k}\,\|E'\|_F$. The lemma follows. ■

The previous lemma quantifies what happens if we replace $WW^TA$ by $WW_\alpha^TA$. Working with $W_\alpha^T$ for small $\alpha$ instead of $W^T$ will decrease the sensitivity of the computation of $W_\alpha^TA$. On the other hand, by the previous lemma we have an expression for the error resulting from the truncation step.

5.2 Strong coherence 

Here we introduce and work with the notion of $\mu_0$-coherence which is a standard notion of coherence. As we will see in Section 5.3, it is a stronger notion than $C$-coherence. Consequently, the results we will be able to obtain using $\mu_0$-coherence are stronger than our previous results on $C$-coherence in certain aspects.

Definition 5.5 ($\mu_0$-coherence). Let $U$ be an $m\times r$ matrix with orthonormal columns and $r \le n$. Recall that $P_U = UU^T$. The $\mu_0$-coherence of $U$ is defined as

$$\mu_0(U) = \frac{m}{r}\max_{1\le j\le m}\|P_Ue_j\|^2 = \frac{m}{r}\max_{1\le j\le m}\|U_{(j)}\|^2. \qquad (8)$$

Here, $e_j$ denotes the $j$-th $m$-dimensional standard basis vector and $U_{(j)}$ denotes the $j$-th row of $U$.

The $\mu_0$-coherence of an $m\times n$ matrix $A$ of rank $r$ given in its singular value decomposition $U\Sigma V^T$ where $U \in \mathbb{R}^{m\times r}$ is defined as $\mu_0(U)$.

Fact 5.6. $1 \le \mu_0(U) \le m$

Proof. Since $U$ is orthonormal, there must always exist a row of squared norm at least $r/m$. On the other hand, no row of $U$ has squared norm larger than $r$. ■
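Equation 8 translates directly into code. The function name below is hypothetical, and the input is assumed to already have orthonormal columns (this is not checked).

```python
def mu0_coherence(U):
    """mu_0(U) = (m / r) * max_j ||U_(j)||^2 for an m x r matrix given as rows."""
    m, r = len(U), len(U[0])
    return (m / r) * max(sum(x * x for x in row) for row in U)


# Columns are standard basis vectors e_1, e_2: the most coherent case for r = 2.
U_basis = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
# Columns are normalized +-1 vectors: every row has the same norm, so mu_0 = 1.
U_flat = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]
mu_basis = mu0_coherence(U_basis)
mu_flat = mu0_coherence(U_flat)
```

The first example gives $\mu_0 = m/r = 2$ and the second gives $\mu_0 = 1$, both within the range stated in Fact 5.6.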

The above notion is used extensively throughout the literature in the context of matrix completion and low rank approximation, e.g., in Candes and Recht [CR09], Keshavan et al. [KMO10], Talwalkar and Rostamizadeh [TR10], Mohri and Talwalkar [MT11]. Motivated by the Netflix problem, Candes and Tao [CT10] study matrix completion for matrices satisfying a stronger incoherence assumption than small $\mu_0$-coherence.

Our goal from here on is to show that if we run our range finding algorithm from Section 4.1 on a low-coherence matrix it will produce a projection matrix with small entries. This result (presented in Lemma 5.11) requires several technical lemmas.

The first technical step is a lemma showing that vectors that lie in the range of an incoherent matrix must have small $\ell_\infty$-norm.

Lemma 5.7. Let $U$ be an orthonormal $m\times r$ matrix. Suppose $w \in \mathrm{range}(U)$ and $\|w\| = 1$. Then,

$$\|w\|_\infty^2 \le \frac{r}{m}\cdot\mu_0(U).$$

Proof. Let $u_1,\dots,u_r$ denote the columns of $U$. By our set of assumptions, there exist $\alpha_1,\dots,\alpha_r \in \mathbb{R}$ such that

$$w = \sum_{i=1}^r \alpha_iu_i \quad\text{and}\quad \sum_{i=1}^r \alpha_i^2 = 1.$$

Therefore, denoting the $j$-th entry of $w$ by $w_j$ and the $j$-th entry of $u_i$ by $u_{ij}$, we have

$$w_j^2 = \Big(\sum_{i=1}^r \alpha_iu_{ij}\Big)^2 \le \Big(\sum_{i=1}^r \alpha_i^2\Big)\Big(\sum_{i=1}^r u_{ij}^2\Big) \quad\text{(by Cauchy-Schwarz)}$$
$$= \|U_{(j)}\|^2.$$

In particular $\|w\|_\infty^2 \le \max_{j\in[m]}\|U_{(j)}\|^2$. On the other hand, using Definition 5.5,

$$\max_{j\in[m]}\|U_{(j)}\|^2 = \frac{r}{m}\cdot\mu_0(U).$$

The lemma follows. ■



We will need the following geometric lemma: If we start with a small orthonormal set of vectors of low coherence and we append few random unit vectors, then the span of the resulting set of vectors has a low coherence basis.

Lemma 5.8. Let $u_1,\dots,u_r \in \mathbb{R}^m$ be orthonormal vectors. Pick unit vectors $n_1,\dots,n_k \in S^{m-1}$ uniformly at random. Assume that

$$m \ge c_0k(r+k)\log(r+k) \qquad (9)$$

where $c_0$ is a sufficiently large constant. Then, there exists a set of orthonormal vectors $v_1,\dots,v_{r+k} \in \mathbb{R}^m$ such that $\mathrm{span}\{v_1,\dots,v_{r+k}\} = \mathrm{span}\{u_1,\dots,u_r,n_1,\dots,n_k\}$ and furthermore, with probability 99/100,

$$\mu_0([v_1 \mid \cdots \mid v_{r+k}]) \le 2\mu_0([u_1 \mid \cdots \mid u_r]) + O\Big(\frac{k\log m}{r}\Big).$$

Proof. We will construct the basis iteratively using the Gram-Schmidt orthonormalization algorithm starting with the partial orthonormal basis $u_1,\dots,u_r$. The algorithm works as follows: At iteration $i$ we have obtained a partial orthonormal basis $v_1,\dots,v_t$ where $t = r+i-1$. We then pick a random unit vector $v \in S^{m-1}$ and let $v' = \sum_{j=1}^t v_jv_j^Tv$. Put

$$v_{t+1} = \frac{v - v'}{\|v - v'\|}.$$

Let $V_t = [v_1 \mid \cdots \mid v_t]$ and $V_{t+1} = [V_t \mid v_{t+1}]$. Our goal is to bound $\|v_{t+1}\|_\infty^2$ as this will directly lead to a bound on $\mu_0(V_{t+1})$ in terms of $\mu_0(V_t)$. Summing up this bound over $t$ will lead to a bound on $\mu_0(V_{r+k})$ which is what the lemma is asking for.

Let us start with two simple claims that follow from measure concentration on the sphere. The first one bounds the $\ell_\infty$-norm of a random unit vector.

Claim 5.9. $\|v\|_\infty \le O\big(\sqrt{\log m/m}\big)$ with probability $1 - 1/200k$.

Proof. It is not hard to show that for every $i \in [m]$, the coordinate projection $f_i(v) = v_i$ is a Lipschitz function on the sphere. Moreover, the median of $f_i$ is 0 by spherical symmetry. By measure concentration (Theorem B.1), $\Pr\{|f_i| > \varepsilon\} \le O(\exp(-\varepsilon^2m/2))$. Setting $\varepsilon = O\big(\sqrt{(\log m + \log k)/m}\big) = O\big(\sqrt{\log(m)/m}\big)$ and taking a union bound over all $m$ coordinates completes the proof. ■

The second claim we need bounds the Euclidean norm of $v'$.

Claim 5.10. $\|v'\|^2 \le O\big(\frac{(r+k)\log(r+k)}{m}\big)$ with probability $1 - 1/200k$.

Proof. Proceeding as in the proof of the previous claim, we note that for each $i \in [t]$, $f_i(v) = \langle v_i, v\rangle$ is a Lipschitz function on the sphere with median 0. Applying Theorem B.1 with $\varepsilon = O\big(\sqrt{(\log k + \log t)/m}\big)$, it follows that with probability $1 - 1/200kt$,

$$\langle v_i, v\rangle^2 \le O\Big(\frac{\log(kt)}{m}\Big).$$

Taking a union bound over all $i \in [t]$ we have with probability $1 - 1/200k$,

$$\|v'\|^2 = \sum_{i=1}^t \langle v_i, v\rangle^2 \le O\Big(\frac{t\log(kt)}{m}\Big) \le O\Big(\frac{(r+k)\log(r+k)}{m}\Big)$$

where we used that $t \le r+k$. ■

On the one hand, note that $v'$ is in the span of $v_1,\dots,v_t$ by definition. Hence, Lemma 5.7 directly implies that

$$\|v'\|_\infty^2 \le \frac{t}{m}\cdot\|v'\|^2\cdot\mu_0(V_t). \qquad (10)$$

Hence, combining Equation 10 with Claim 5.10, we have with probability $1 - 1/200k$,

$$\|v'\|_\infty^2 \le O\Big(\frac{t(r+k)\log(r+k)}{m^2}\cdot\mu_0(V_t)\Big). \qquad (11)$$

On the other hand, we can bound $\|v_{t+1}\|_\infty^2$ as follows:

$$\|v_{t+1}\|_\infty^2 = \frac{\|v - v'\|_\infty^2}{\|v - v'\|^2} \le \frac{\|v\|_\infty^2 + 2\|v\|_\infty\|v'\|_\infty + \|v'\|_\infty^2}{\|v - v'\|^2} \le \frac{2\big(\|v\|_\infty^2 + \|v'\|_\infty^2\big)}{\|v - v'\|^2}.$$

By Claim 5.10 we have that with probability $1 - 1/200k$,

$$\|v - v'\|^2 = \|v\|^2 + \|v'\|^2 - 2\langle v, v'\rangle = 1 - \|v'\|^2 \ge 1 - O\Big(\frac{(r+k)\log(r+k)}{m}\Big).$$

In the second step above we used that $\langle v, v'\rangle = \sum_{i=1}^t \langle v_i, v\rangle^2 = \|v'\|^2$. We then applied Claim 5.10 in the final inequality. By Equation 9, $m$ is sufficiently large so that

$$\frac{1}{\|v - v'\|^2} \le O(1). \qquad (12)$$

Combining Equation 11 with Equation 12 and applying Claim 5.9, we conclude that with probability at least $1 - 1/100k$,

$$\|v_{t+1}\|_\infty^2 \le O\Big(\frac{\log m}{m}\Big) + O\Big(\frac{t(r+k)\log(r+k)}{m^2}\Big)\mu_0(V_t).$$

But when the above bound on $\|v_{t+1}\|_\infty^2$ holds, then we must have

$$\mu_0(V_{t+1}) \le \mu_0(V_t) + \frac{m}{t+1}\|v_{t+1}\|_\infty^2 \le \Big(1 + O\Big(\frac{(r+k)\log(r+k)}{m}\Big)\Big)\mu_0(V_t) + O\Big(\frac{\log m}{r}\Big). \qquad (13)$$

Taking a union bound over all $k$ steps, we find that with probability 99/100, Equation 13 is true at all steps of the Gram-Schmidt algorithm. Assuming that this event occurs, we have:

$$\mu_0(V_{r+k}) \le \Big(1 + O\Big(\frac{(r+k)\log(r+k)}{m}\Big)\Big)^k\mu_0(V_r) + O\Big(\frac{k\log m}{r}\Big)$$
$$\le 2\mu_0(V_r) + O\Big(\frac{k\log m}{r}\Big) \quad\text{(since } m \gg k(r+k)\log(r+k) \text{ by Equation 9)}$$

This finishes our proof of the lemma since $\mu_0(V_r) = \mu_0([u_1 \mid \cdots \mid u_r])$ by definition. ■

The choice of failure probability in the previous lemma was rather arbitrary and stronger bounds can be achieved. We finally arrive at the main lemma in this section.

Lemma 5.11. Let $A$ be an $m\times n$ matrix of rank $r$. Let $\Omega \sim N(0,1)^{n\times k}$ with $k \le r$ denote a random standard Gaussian matrix and define $Y = A\Omega$. Assume that $m \ge c_0kr\log r$ for sufficiently large constant $c_0$. Further, let $\sigma > 0$ and $N \sim N(0,\sigma^2)^{m\times k}$ denote a random Gaussian matrix with i.i.d. entries sampled from $N(0,\sigma^2)$. Put $\tilde Y = A\Omega + N$ and let $w_1,\dots,w_k$ be an orthonormal basis for the range of $\tilde Y$. Then, with probability 99/100,

$$\max_{i\in[k]}\|w_i\|_\infty \le \sqrt{\frac{4r}{m}\mu_0(A)} + O\Big(\sqrt{\frac{k\log m}{m}}\Big).$$

Proof. Let $U$ denote the left singular factor of $A$. Let $u_1,\dots,u_r$ denote the columns of $U$. We have,

$$\mathrm{span}(\{w_1,\dots,w_k\}) = \mathrm{range}(\tilde Y),$$

since $w_1,\dots,w_k$ is an orthonormal basis for the range of $\tilde Y$ by construction. On the other hand,

$$\mathrm{range}(Y) \subseteq \mathrm{range}(A) = \mathrm{span}(\{u_1,\dots,u_r\}).$$

Since $\tilde Y = Y + N$ this implies that $\mathrm{range}(\tilde Y) \subseteq \mathrm{span}\{u_1,\dots,u_r,n_1,\dots,n_k\}$ where $n_1,\dots,n_k$ are the columns of $N$ normalized such that $\|n_i\| = 1$. By assumption $m$ is large enough so that we can apply Lemma 5.8. Thus we obtain orthonormal vectors $v_1,\dots,v_{r+k}$ satisfying

$$\mathrm{range}(\tilde Y) \subseteq \mathrm{span}\{v_1,\dots,v_{r+k}\}$$

and the matrix $V$ whose columns are $v_1,\dots,v_{r+k}$ has coherence

$$\mu_0(V) \le 2\mu_0(U) + O\Big(\frac{k\log m}{r}\Big)$$

with probability 99/100. In particular, $w_i \in \mathrm{range}(V)$ for all $i \in [k]$. Therefore, by Lemma 5.7, we have that

$$\max_{i\in[k]}\|w_i\|_\infty^2 \le \frac{r+k}{m}\mu_0(V) \le \frac{2(r+k)}{m}\mu_0(U) + O\Big(\frac{(r+k)k\log m}{rm}\Big).$$

Since $k \le r$ and $\mu_0(A) = \mu_0(U)$, we conclude that

$$\max_{i\in[k]}\|w_i\|_\infty \le \sqrt{\frac{4r}{m}\mu_0(A)} + O\Big(\sqrt{\frac{k\log m}{m}}\Big).$$

The lemma follows. ■



Remark 5.12. We remark that the previous lemma is essentially tight. Indeed, under the given assumption on $A$ there could be a left singular vector of $\ell_\infty$-norm $\sqrt{r\mu_0(A)/m}$. The above lemma implies that we are never more than a $O(\sqrt{\log m})$-factor away from this bound.



5.3 Relation between C-coherence and $\mu_0$-coherence

Here we show that the assumption of small $\mu_0$-coherence is strictly stronger than that of small $C$-coherence assuming the rank of the matrix is not too large.

Lemma 5.13. Let $A$ be an $m\times n$ matrix of rank $r$. Then, $A$ is $C$-coherent where

$$C \le \sqrt{r\mu_0(A)}.$$

Proof. Let the SVD of $A$ be $U\Sigma V^T$ and denote the right singular vectors by $v_1,\dots,v_r$. Extend them arbitrarily to an orthonormal basis of $\mathbb{R}^n$, denoted $v_1,\dots,v_n$. We then have for every $j \in [m]$,

$$\|e_j^TA\|^2 = \sum_{i=1}^n \langle e_j^TA, v_i\rangle^2 = \sum_{i=1}^r \big(\sigma_i\langle e_j, u_i\rangle\big)^2 \le \Big(\sum_{i=1}^r \sigma_i|\langle e_j, u_i\rangle|\Big)^2, \qquad (14)$$

where we used that the $\ell_2$-norm of a vector is bounded by the $\ell_1$-norm. On the other hand,

$$\Big(\sum_{i=1}^r \sigma_i|\langle e_j, u_i\rangle|\Big)^2 \le \Big(\sum_{i=1}^r \sigma_i^2\Big)\Big(\sum_{i=1}^r \langle e_j, u_i\rangle^2\Big) = \|A\|_F^2\,\|U_{(j)}\|^2, \qquad (15)$$

where we used Cauchy-Schwarz in the inequality. It follows that

$$\max_{j\in[m]}\|e_j^TA\|^2 \le \|A\|_F^2\max_{j\in[m]}\|U_{(j)}\|^2 = \|A\|_F^2\cdot\frac{r\mu_0(A)}{m}.$$

Taking square roots on both sides and rearranging, we find

$$\frac{\sqrt{m}}{\|A\|_F}\max_{j\in[m]}\|e_j^TA\| \le \sqrt{r\mu_0(A)}.$$

Note that the left hand side is exactly the smallest $C$ for which $A$ is $C$-coherent. This proves the lemma. ■

Recall that Lemma 5.2 showed that the singular vectors corresponding to large singular values of a $C$-coherent matrix $A$ cannot be too sparse. In particular, the top singular vectors must have small $\mu_0$-coherence as a result. However, we cannot rule out that there are singular vectors corresponding to small singular values that do have large coordinates.

6 Privacy-preserving low rank approximations

In this section we compose the range finder, projection and truncation step to get a private low rank approximation algorithm suitable for matrices of low coherence.

Input: Matrix $A \in \mathbb{R}^{m\times n}$, target rank $r \ge 2$, oversampling parameter $p \ge 2$, pruning parameter $\alpha > 0$, privacy parameters $\varepsilon,\delta \in (0,1)$.

1. Range finder: Run the range finder (Figure 2) on $A$ with sampling parameter $k = p + r$ and privacy parameters $(\varepsilon/2, \delta/2)$. Let the output be denoted by $W$.

2. Pruning: Let $W_\alpha$ be the matrix obtained from $W$ by zeroing out all entries larger than $\alpha$.

3. Projection: Run the projection algorithm (Figure 3) on input $A$, $W_\alpha$ and privacy parameters $(\varepsilon/2, \delta/2)$. Let $B$ denote the output of the projection algorithm.

Output: Matrix $B$ of rank $k = (r + p)$.

Figure 4: The private find and project algorithm (PFP) for computing privacy-preserving low-rank approximations

Lemma 6.1. The PFP algorithm satisfies $(\varepsilon,\delta)$-differential privacy.

Proof. This follows directly from composition and the privacy guarantee achieved by the subroutines. ■

The next theorem details the performance of PFP on $C$-coherent matrices. In particular, it shows that in a natural range of parameters it improves significantly over randomized response (input perturbation).

Theorem 6.2 (Approximation for C-coherent matrices). There is an $(\varepsilon,\delta)$-differentially private algorithm that given a $C$-coherent matrix $A \in \mathbb{R}^{m\times n}$ and parameters $r \ge 2$, $p \ge 2$ produces a rank $k = r+p$ matrix $B$ such that with probability 9/10,



1 + - 

P 



- 1 e \mi E^i^ 



1/2 A 



l|A - Sllf < O 

(16)

In particular, the second error term is $o\big(\sqrt{kn\log(k/\delta)}/\varepsilon\big)$, whenever

$$m = o(n) \quad\text{and}\quad \frac{Ck\|A\|_F\sqrt{\ln(k/\delta)}}{\sqrt{n}} = o(\sqrt{m}). \qquad (17)$$

We generally think of $C,k$ as small compared to both $m$ and $n$. Equation 17 states that the algorithm outperforms randomized response whenever $m$ is not too large compared to $n$ and not too small compared to the rank $k$, the Frobenius norm of $A$ divided by $\sqrt{n}$, and the coherence parameter $C$. These two conditions are naturally satisfied for a wide range of parameters. For example, when $\|A\|_F = O(\sqrt{kn})$ (so that randomized response no longer provides non-trivial error) and $C = O(1)$ (i.e., the matrix is very incoherent), then the requirement on $m$ is just that

$$\omega(k^3) \le m \le o(n).$$

The proof of Theorem 6.2 is a straightforward combination of our previous error bounds for range finding, pruning and projection.

Proof of Theorem 6.2. We run PFP with the given set of parameters $r, p, \varepsilon, \delta$ and a suitable choice of the pruning parameter $\alpha > 0$. Before fixing $\alpha$, we claim that the error of the algorithm satisfies, with probability 9/10,

IIA - fill, < O I . IIA - A,.y + ^ UVi^r^ak^). '^^] 

Here, the first term follows from Theorem 3.1 and an application of Markov's inequality to argue that the bound holds except with sufficiently small constant probability. The other terms follow from Theorem 4.7 (error bound of the projection algorithm), Corollary 4.5 (error bound of the range finder), and Lemma 5.4 (error bound for the pruning step with parameter $\alpha$). We can now optimize $\alpha$ so as to achieve the geometric mean between the two terms that it appears in (as $\alpha$ and $1/\alpha$). Running PFP with this choice of $\alpha$ directly results in the error bound stated in Equation 16. Equation 17 is now easily verified by equating the $O(\cdot)$-term in Equation 16 with $o\big(\sqrt{kn\log(k/\delta)}/\varepsilon\big)$ and rearranging.

Since all sub-routines fail with probability at most 1/100, we can take a union bound to conclude that the algorithm fails to satisfy the error bound with probability at most 1/10. ■
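The optimization over $\alpha$ in the proof is the standard balancing of a term proportional to $\alpha$ against a term proportional to $1/\alpha$. The sketch below treats the two coefficients as generic placeholders `p_coef` and `q_coef` (hypothetical names; the paper's actual coefficients, including logarithmic factors, are not reproduced here).

```python
import math


def balance_alpha(p_coef, q_coef):
    """Minimize p_coef * alpha + q_coef / alpha over alpha > 0.

    The minimizer is alpha = sqrt(q_coef / p_coef); the minimum value is
    2 * sqrt(p_coef * q_coef), i.e. twice the geometric mean of the coefficients.
    """
    alpha = math.sqrt(q_coef / p_coef)
    return alpha, p_coef * alpha + q_coef / alpha


alpha_star, best_value = balance_alpha(4.0, 1.0)
```

In the proof, the term growing with $\alpha$ comes from the projection noise and the term shrinking with $\alpha$ from the pruning error, so the optimal choice makes the two equal.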

We will next analyze the performance of PFP on $\mu_0$-incoherent matrices. In this case no truncation is necessary, since we argued that the projection matrix with high probability already has very small entries. The error bound here is stronger in certain aspects as we will discuss in a moment.

Theorem 6.3 (Approximation for $\mu_0$-coherent matrices). There is an $(\varepsilon,\delta)$-differentially private algorithm that given a rank $R$ matrix $A \in \mathbb{R}^{m\times n}$ and parameters $r \ge 2$, $p \ge 2$ such that $k = r + p \le R$ and $m \ge c_0Rk\log R$ produces a rank $k$ matrix $B$ such that with probability 9/10,



IIA - B\\f < O 



IIA 



. A.II. . ( . ^kRMA)^jHogm _ log|M) ^ 



(18) 



[^p-l 

In particular, the error is $o\big(\sqrt{kn\log(k/\delta)}/\varepsilon\big)$, whenever

$$m = o(n) \quad\text{and}\quad Rk(\mu_0(A) + \log m)\sqrt{\log(k/\delta)} = o(m). \qquad (19)$$

Just as in the previous theorem we get a range for $m$ in which the algorithm improves over randomized response. Here, we need the coherence of $A$ to be small compared to $m$. We also observe a dependence on the rank of the matrix. This means the algorithm presents no improvement if the
matrix is close to being full rank. Recall that /loiA) can be as small as 0(1). In particular, in the 
natural case where /loiA), k, R all are small compared to m, e.g., m"-^, the requirement in Equation 19 
reduces to m = o{n). 

Note that Theorem 6.3 is quantitatively stronger than Theorem 6.2 in the following regime: When 
k,R,C,HQ{A) are all small (e.g., n"^^^), m < ^fii and ||A||^ ^ n, then Theorem 6.3 improves over ran- 
domized response by a factor of roughly ^Jm, whereas Theorem 6.2 achieves an wi^ ''^-factor improve- 
ment. 
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Proof of Theorem 6.3. We run PFP with the given set of parameters r, p, s, 6 and a - \. Note that this 
choice of a implies that we never modify the matrix returned by the range finder. We claim that the 
error of the algorithm is with probability 9/10, 

i f. ~ u. . u ( rr- lkRnQ{A) + P logm r—] log(/t/(5)^ 

\\A-B\\F<o\^\ + j-j-\\A-Ar\\F+\^+ y ^ ' ^ — ^v^j- ' j , 

which is what we stated in the theorem. The first error term follows as before from Theorem 3.1 and Markov's inequality so that it holds with probability 99/100. The term of $O\big(\sqrt{km\log(k/\delta)}/\varepsilon\big)$ follows from Corollary 4.5. To understand the remaining terms, note that by Lemma 5.11 we have that the matrix $W = [w_1 \mid \cdots \mid w_k]$ returned by the range finder satisfies with probability 99/100,

$$\alpha = \max_{i\in[k]}\|w_i\|_\infty \le \sqrt{\frac{4R}{m}\mu_0(A)} + O\Big(\sqrt{\frac{k\log m}{m}}\Big).$$

In applying Lemma 5.11 we needed that $m \ge c_0Rk\log R$ for sufficiently large constant $c_0$, which is satisfied by our assumption. Hence, Theorem 4.7 ensures that the error resulting from the projection operation is at most $O\big(\alpha k\sqrt{n}\log(k/\delta)/\varepsilon\big)$. Expanding $\alpha$ in the latter bound gives the stated error term. Equation 19 is now easily verified by equating the $O(\cdot)$-term in Equation 18 with $o\big(\sqrt{kn\log(k/\delta)}/\varepsilon\big)$ and rearranging.

Again, we can take a union bound over the failure probabilities of the sub-routines to bound the probability that our algorithm fails to satisfy the stated bound by 1/10. ■

A Privacy of the Gaussian Mechanism

Theorem A.1 (Gaussian Mechanism). Let $x, y \in \mathbb{R}^d$ be any two vectors such that $\|x-y\|_2 \le c$. Let $Y \in \mathbb{R}^d$ be an independent random draw from $N(0,\rho^2)^d$, where $\rho = c\varepsilon^{-1}\sqrt{\log(1.25/\delta)}$. Then for any $S \subseteq \mathbb{R}^d$:

$$\Pr[x+Y\in S] \le \exp(\varepsilon)\Pr[y+Y\in S] + \delta$$

Proof. For a set $S \subseteq \mathbb{R}^d$, write $S - x$ to denote the set $\{s - x : s \in S\}$ and $SQ$ to denote $\{sQ : s \in S\}$. Write $S_i = \{s_i : s \in S\}$ to denote the projection of the set onto the $i$-th coordinate of its elements.

First we consider the one dimensional case, where $x,y \in \mathbb{R}$ and $\|x-y\|_2 = |x-y| \le c$. Without loss of generality, we may take $x = 0$ and $y = -c$. Let $T \subseteq S$ be the set $T = \{z \in S : z < \frac{\rho^2\varepsilon}{c} - \frac{c}{2}\}$. First, we argue that $\Pr[x+Y \in S\setminus T] = \Pr[Y \in S\setminus T] \le \delta$. This follows directly from the tail bound:

$$\Pr[Y \ge t] \le \frac{\rho}{\sqrt{2\pi}}\exp(-t^2/2\rho^2)$$

Observing that:

$$\Pr[Y \in S\setminus T] \le \Pr\Big[Y \ge \frac{\rho^2\varepsilon}{c} - \frac{c}{2}\Big]$$

and plugging in our choice of $\rho = c\varepsilon^{-1}\sqrt{\log(1.25/\delta)}$ completes the claim. Next we show that conditioned on the event that $Y \notin S\setminus T$, we have: $\Pr[x+Y \in S] \le \exp(\varepsilon)\Pr[y+Y \in S]$. Conditioned on this event we have:

$$\ln\frac{\Pr[Y \in S]}{\Pr[Y \in S + c]} \le \max_{z\in T}\,\ln\frac{\Pr[Y = z]}{\Pr[Y = z + c]} = \max_{z\in T}\,\ln\frac{\exp(-z^2/2\rho^2)}{\exp(-(z+c)^2/2\rho^2)}$$

where here $\Pr[Y = t]$ denotes the probability density function of $N(0,\rho^2)$ at $t$. This quantity is bounded by $\varepsilon$ whenever:

$$z \le \frac{\rho^2\varepsilon}{c} - \frac{c}{2}$$

i.e. whenever $z \in T$. Therefore:

$$\Pr[x+Y \in S] \le \exp(\varepsilon)\Pr[y+Y \in S] + \delta$$

which completes the proof in the 1-dimensional case.

For the multi-dimensional case, we will take advantage of the rotational invariance of the Gaus- 
sian distribution to rotate any Euclidean length c-perturbation into a length c standard basis vector, 
reducing it to the 1 -dimensional case. 

Consider any two vectors $x, y \in \mathbb{R}^d$ such that $\|x - y\|_2 \le c$. Let $Q \in \mathbb{R}^{d \times d}$ be the orthonormal (rotation) matrix such that $(x - y)Q = c' \cdot e_1$, where $e_1 \in \mathbb{R}^d$ is the 1st standard basis vector $e_1 = (1, 0, \ldots, 0)$, and $c' = \|x - y\|_2 \le c$. We will use the fact that for any orthonormal matrix $Q$, and for any $Y \sim N(0, \rho^2)^d$, $YQ \sim N(0, \rho^2)^d$: i.e. spherically symmetric Gaussian distributions are invariant under rotation. We have:

$$\Pr[x + Y \in S] = \Pr[(x + Y)Q \in SQ] = \Pr[xQ + YQ \in SQ] = \Pr[Y \in SQ - xQ]$$
We want to bound:

$$\ln \frac{\Pr[Y \in SQ - xQ]}{\Pr[Y \in SQ - yQ]}$$
Now note that we have chosen $Q$ such that $(SQ - xQ)_i = (SQ - yQ)_i$ for all $i > 1$ (because $(xQ)_i = (yQ)_i$ for all $i > 1$). Therefore, we have:



$$\frac{\Pr[Y \in SQ - xQ]}{\Pr[Y \in SQ - yQ]}$$



Note that by rotational invariance, we have: $\Pr[(zQ)_1 > t] = \Pr[z_1 > t]$ for any vector $z \in \mathbb{R}^d$, and so we are now again in the 1-dimensional case, in which the theorem is already proven. 
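The change of basis used in the multi-dimensional case can be made concrete with a small numerical sketch. It builds $Q$ as a Householder reflection, which is one convenient orthonormal matrix with the required property and is an assumption of this sketch rather than the construction of the proof (a reflection is not a rotation in the strict sense, but the spherical Gaussian is invariant under any orthonormal map). The helper name `householder_to_e1` and the example vectors are hypothetical.

```python
import numpy as np

def householder_to_e1(v):
    """Return an orthonormal Q with v @ Q = ||v||_2 * e_1 (row-vector convention)."""
    d = v.shape[0]
    norm = np.linalg.norm(v)
    e1 = np.zeros(d)
    e1[0] = 1.0
    u = v - norm * e1
    u_norm = np.linalg.norm(u)
    if u_norm < 1e-12:
        # v is zero or already a nonnegative multiple of e_1.
        return np.eye(d)
    u = u / u_norm
    # Householder reflection: symmetric and orthonormal.
    return np.eye(d) - 2.0 * np.outer(u, u)

x = np.array([3.0, 1.0, -2.0, 0.5])
y = np.array([2.5, 1.5, -1.0, 0.0])
Q = householder_to_e1(x - y)

c_prime = np.linalg.norm(x - y)
diff_rotated = (x - y) @ Q    # equals c' * e_1 up to rounding
x_rot, y_rot = x @ Q, y @ Q   # agree in every coordinate i > 1
```

After the change of basis, $xQ$ and $yQ$ differ only in the first coordinate, which is the property the proof uses to reduce to the 1-dimensional case.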



B Measure concentration on the sphere 

In Section 5 we used the following classical result regarding concentration of Lipschitz functions on 
the sphere. A proof can be found, for example, in Matoušek's textbook [Mat02]. 

Theorem B.1 (Lévy's lemma). Let $f : S^{d-1} \to \mathbb{R}$ be a Lipschitz function in the sense that

$$|f(x) - f(y)| \le \|x - y\|_2$$

and define the median of $f$ as $\mathrm{med}(f) = \sup\{t \in \mathbb{R} : \Pr\{f \le t\} \le 1/2\}$. Then,

$$\Pr\{|f - \mathrm{med}(f)| > \varepsilon\} \le 4\exp(-\varepsilon^2 d/2),$$

where probability and expectation are computed with respect to the uniform measure on 
the sphere. 
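The bound can be sanity-checked numerically on a simple example. The sketch below is only an illustration under stated assumptions: it takes the coordinate function $f(x) = x_1$, which satisfies the Lipschitz condition above and has median $0$ by symmetry, samples the uniform measure on the sphere by normalizing standard Gaussian vectors, and compares the empirical tail with $4\exp(-\varepsilon^2 d/2)$. The dimension, sample size and value of $\varepsilon$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples, eps = 50, 20000, 0.3

# Normalized standard Gaussian vectors are uniformly distributed on S^{d-1}.
g = rng.standard_normal((n_samples, d))
points = g / np.linalg.norm(g, axis=1, keepdims=True)

# f(x) = x_1 satisfies |f(x) - f(y)| <= ||x - y||_2 and has median 0 by symmetry.
f_vals = points[:, 0]
median_f = 0.0

empirical_tail = np.mean(np.abs(f_vals - median_f) > eps)
levy_bound = 4.0 * np.exp(-eps**2 * d / 2.0)
```

For this particular function the empirical tail is much smaller than the bound, which is expected since the lemma holds uniformly over all functions with Lipschitz constant 1.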

C The Netflix Data 

In this section we illustrate why the data set released by Netflix satisfies the assumptions underlying 
Theorem 1.1. That is, the matrix is unbalanced, sparse and C-coherent (Definition 5.1) for very small 
C. Indeed, according to information released by Netflix, the data set has the following properties: 

1. There are $x$ = 100,480,507 movie ratings, $m$ = 17,770 movies and $n$ = 480,189 users. In 
particular, the data set is very sparse in that only an $x/(mn) \approx 0.011$ fraction of the matrix is 
nonzero. Also note that $m \ll n$. 

2. The most rated movie in the data set is Miss Congeniality with $t$ = 221,715 ratings (followed 
by Independence Day with 216,233). Hence, the maximum number of entries in one row is 
only a $t/x \approx 0.0022$ fraction of the total number of nonzero entries. Moreover, all entries of the 
matrix are in $\{1, \ldots, 5\}$ and thus very small numbers. 

We conclude that, indeed, the Netflix matrix is sparse and the maximum norm of any row takes up 
only a tiny fraction of the total norm of the matrix. We further believe that these properties are likely 
to hold in other recommender systems. Indeed, the average number of ratings per user should be 
small (thus resulting in a sparse matrix), and no item should be rated almost as often as all other items 
taken together (thus resulting in low coherence). 
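The two fractions quoted in this section follow directly from the counts listed above, as the following short computation shows. The variable names are chosen for illustration; the numbers are those stated in the text.

```python
# Counts as stated in the text.
x_ratings = 100_480_507   # total number of ratings (nonzero entries)
m_movies = 17_770         # number of movies
n_users = 480_189         # number of users
t_max = 221_715           # ratings of the most rated movie

# Fraction of nonzero entries in the m x n matrix.
density = x_ratings / (m_movies * n_users)

# Share of all nonzero entries that fall in the most rated movie's row.
max_row_share = t_max / x_ratings

print(f"density = {density:.4f}, max row share = {max_row_share:.4f}")
```

Both quantities are small, which is the sparsity and low-coherence behaviour the section describes.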